Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] #193

Merged
merged 10 commits into from
Jun 19, 2024

Conversation

deldesir
Copy link
Collaborator

@deldesir deldesir commented Jun 18, 2024

Video failed due to unavailable fragments are not kept in the database. Those failed for other reasons are left but will not carry over the next download session.
 
Tested on Ubuntu 24.04 with #194
image

@deldesir deldesir self-assigned this Jun 18, 2024
@deldesir deldesir marked this pull request as draft June 18, 2024 15:34
@holta holta added bug Something isn't working enhancement New feature or request labels Jun 18, 2024
@holta
Copy link
Member

holta commented Jun 18, 2024

@codewiz can you review?

@holta
Copy link
Member

holta commented Jun 18, 2024

@deldesir can you recommend any YouTube URL (video or playlist) known to regularly pollute /library/calibre-web/xklb-metadata.db ?

(So we can all use this test case URL to confirm improved behavior!)

@deldesir
Copy link
Collaborator Author

deldesir commented Jun 18, 2024

Pollution and "carrying over" are 2 different things. The former happens when a downloading session is cancelled prematurely. For example, if you click on "cancel" button in the video row (Tasks page) or if you restart Calibre-Web while a playlist download is in progress. The latter happens in the case of failures due to unavailable fragments or because the video is/was a live.

To answer your question, just restart Calibre-Web or click on cancel before the download task kicks in.

@holta
Copy link
Member

holta commented Jun 18, 2024

Thanks @deldesir for explaining!

Ready to merge (PRs #193 and #194) if @codewiz agrees code is clean / safe / readable ✅

else:
log.error("No error found in the database, likely the video failed due to unavailable fragments.")
self.message = f"{self.media_url_link} failed to download: No path found in the database"
media_id = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()[0]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one record with this webpath guaranteed to exist in the media table, even when the query at line 89 returned no records?

Since the query above has the extra condition ... AND path NOT LIKE 'http%', it's possible that this less restrictive query might return one result (or more?).

If the query fails, trying to access field [0] will throw, and you may need to add an extra check:

results = conn.execute("SELECT id FROM media WHERE webpath = ?", (self.media_url,)).fetchone()
if results:
  media_id = results[0]
  conn.execute("DELETE ...")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand your point well. I am confident one record with this webpath will always be present because it's in fact "the" trigger of the actual download task. It's mandatory. However the condition AND path NOT LIKE 'http%' might not be met under certain circumstances, i.e when the video fails to have it path updated with a path.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed how the media table is created and updated by xklb, and it seems that a record with webpath = media_url will be inserted by the lb dl child process at some point...

Could there be cases where the child process fails before updating the database?

If it's possible (even if rare), then this query will return 0 records, and this code should defensively check before trying to access the result.

Copy link
Collaborator Author

@deldesir deldesir Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The running of the child process alone attests the existence of webpath = media_url. Yes there are cases where lb dl can fail, for example live videos, videos without an available/suitable format, videos stuck due to unavailable fragments or network issues. But it will never return 0 record because it's "metadata fetch" that creates the record exists and ensures this specific record is used per

xklb_full_cmd="lb dl ${XKLB_DB_FILE} --video --search ${URL_OR_SEARCH_TERM} ${FORMAT_OPTIONS} --write-thumbnail --subs -o ${OUTTMPL} ${VERBOSITY}"
(notice the --search option)

@@ -111,6 +118,7 @@ def run(self, worker_thread):
# 2024-02-17: Dedup Design Evolving... https://github.com/iiab/calibre-web/pull/125
conn.execute("UPDATE media SET path = ? WHERE webpath = ?", (new_video_path, self.media_url))
conn.execute("UPDATE media SET webpath = ? WHERE path = ?", (f"{self.media_url}&timestamp={int(datetime.now().timestamp())}", new_video_path))
conn.commit()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this too: commit() is required if the sqlite3 connection is in autocommit=False mode, which is the recommended value.

But it seems that autocommit =False is not the current default:
https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.autocommit

No change required here, but it's probably best to set autocommit=False on all sqlite connections? Otherwise, a crash happening between these two UPDATE statements would leave the database in an inconsistent state.

I also noticed that the conn.close() at line 127 can be removed:

with sqlite3.connect(XKLB_DB_FILE) as conn:
    ...transactions done here...
    # conn is implicitly closed when leaving the with block

conn.close()  # this additional close has no effect and can be removed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No change required here, but it's probably best to set autocommit=False on all sqlite connections?

FYI, sqlite3.connect(... autocommit =False) requires Python 3.12, so it's probably best to avoid it while there are still users on older Python versions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs on how sqlite3.Connection objects interact with context managers, I realize that it doesn't work the same of file.open() and other simple I/O objects.

with sqlite3.connect(XKLB_DB_FILE) as conn:
    # ^-- the above context manager will implicitly start an SQL transaction
    ...queries executed here are part of the transaction...
    # The transaction is committed, equivalent to `conn.commit()`

conn.close()  # Still needed because the with block leaves the connection open

See: https://docs.python.org/3/library/sqlite3.html#how-to-use-the-connection-context-manager

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words: you don't need to add conn.commit() when you're already inside a with block and you will exit the block cleanly (e.g. without throwing an exception or crashing).

@codewiz
Copy link

codewiz commented Jun 19, 2024

All review issues addressed, merge at your convenience.

@holta
Copy link
Member

holta commented Jun 19, 2024

Let's try to add a bit more "Defensive Programming" to download.py before merging:

  • Hopefully @deldesir has time Wednesday morning early, to explain/suggest options here, also on a call if possible.

  • My personal opinion is that we should programmatically enforce assumptions (preconditions) — try hard to provide pertinent error messages and logging — even when (especially when!) lb-wrapper, xklb, xklb-metadata.db, yt-dlp (ETC) happen to be broken! 💔 🚒

cps/tasks/download.py Outdated Show resolved Hide resolved
@holta holta changed the title Prevent stuck videos from carrying over next download session Prevent stuck videos from carrying over to next download session ["unavailable fragments" ?] Jun 19, 2024
@@ -95,6 +95,12 @@ def run(self, worker_thread):
if error:
log.error("[xklb] An error occurred while trying to download %s: %s", error[1], error[0])
self.message = f"{error[1]} failed to download: {error[0]}"
else:
log.error("No error found in the database, likely the video failed due to unavailable fragments.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal should be very complete diagnostic hints in both... logs and in "Tasks" view error messages:

@deldesir: what log.error message is best?

Suggested change
log.error("No error found in the database, likely the video failed due to unavailable fragments.")
log.error("%s failed to download: No path or error found in the database (likely the video failed due to unavailable fragments?)", self.media_url_link)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best, but the use of self.media_url_link here is not appropriate in a log message. Use self.media_url instead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, go ahead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @deldesir: please enact self.media_url if that's best, and test!

@holta
Copy link
Member

holta commented Jun 19, 2024

@deldesir: Tested as safe enough to merge?

@deldesir
Copy link
Collaborator Author

deldesir commented Jun 19, 2024

Yes, tested on Ubuntu 24.04

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants